Conversation

@maleadt
Member

@maleadt maleadt commented Feb 7, 2026

We were conflating ghost values (e.g. nothing) with non-ghost things that shouldn't be emitted (e.g. Val instances). This came from a deeper issue where codegen had to unpack Val and Constant wrappers, which shouldn't be required as these have language-level operations to unpack (typevars, getindex). To avoid codegen having to do this, make the intrinsics take constants instead of type-wrapped values. While touching this code, also switch constants to being emitted lazily.
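To illustrate the distinction drawn above, here is a minimal sketch. It assumes a hypothetical `DemoConstant` wrapper standing in for the package's `Constant` type (whose real definition is not shown in this thread); `unwrap`, `ghost_like`, and `carries_value` are likewise illustrative names. It shows that both `nothing` and a `Val` instance are zero-size singletons, yet only the latter carries a value, and that this value can be recovered with ordinary language-level operations.

```julia
# Hypothetical stand-in for a Constant-style wrapper (an assumption, not the
# package's actual definition): the value lives in the type parameter V.
struct DemoConstant{T, V} end
DemoConstant(v::T) where {T} = DemoConstant{T, v}()

# Language-level unpacking: a type variable for Val, getindex for the wrapper.
unwrap(::Val{V}) where {V} = V
Base.getindex(::DemoConstant{T, V}) where {T, V} = V

# Both `nothing` and `Val(3)` have singleton types, so neither needs a
# runtime representation ...
ghost_like(x) = isdefined(typeof(x), :instance)

# ... but only the Val instance encodes information in its type parameters.
carries_value(x) = !isempty(typeof(x).parameters)
```

Under these assumptions, `unwrap(Val(3))` and `DemoConstant(3)[]` both recover the wrapped value without any help from codegen.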

One disadvantage of this approach is that it relies heavily on constant propagation at the Julia level to infer types from value inputs, hence the @constprop :aggressive. I considered using explicit tfuncs as an alternative, combined with intrinsics returning Any, but that didn't fully work because the return type inferred from the body kept leaking in. We also had to keep bodies for now because of JuliaLang/julia#60583. Maybe something to revisit later.
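As a rough sketch of what relying on constant propagation looks like, consider a hypothetical helper (`zeros_tuple`, not part of this PR) whose result type depends on the value of an argument. Whether inference actually specializes on the literal depends on the Julia version and its heuristics, which is the fragility described above.

```julia
# Hypothetical helper: the length of the returned tuple, and therefore its
# type, depends on the *value* of n rather than on its type.
Base.@constprop :aggressive zeros_tuple(n::Int) = ntuple(_ -> 0f0, n)

# With a literal argument, inference may propagate n == 3 into the callee
# and derive a concrete tuple type for the result.
fixed() = zeros_tuple(3)

# With a runtime argument there is no constant to propagate.
dynamic(n::Int) = zeros_tuple(n)
```

The inferred result can be inspected with `Base.return_types(fixed, ())` and compared against `Base.return_types(dynamic, (Int,))`.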

EDIT: revisited; see below.

Fixes #77.

maleadt and others added 4 commits February 7, 2026 10:04
Extract Val type parameters directly via argextype at each intrinsic
codegen site, rather than having get_constant implicitly unwrap Val
and Constant type parameters. This keeps language-level concerns out
of generic codegen infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@maleadt maleadt force-pushed the tb/global_constants branch from 0afd872 to eb57664 February 7, 2026 09:09
@maleadt
Member Author

maleadt commented Feb 7, 2026

Okay, I actually got tfuncs to work reasonably. It's a little more verbose, but I think it's a clearer concept rather than relying on value-based semantics and hoping things get constant propagated by Julia.
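A minimal sketch of the dispatch-based tfunc idea follows. The fallback method has the same signature as the one in interface.jl; the `demo_fill` intrinsic and its override are hypothetical, and the layout of `argtypes` (function type first, then the arguments) is an assumption made for this illustration.

```julia
# Fallback: no return-type override for unknown functions.
tfunc(@nospecialize(f), argtypes::Vector{Any}) = nothing

# Hypothetical intrinsic; the body is only a placeholder.
@noinline demo_fill(value::Float32, n::Int) = ntuple(_ -> value, n)

# Hypothetical override: compute the return type from the constant length
# argument instead of hoping it gets constant-propagated through the body.
# Assumes argtypes[1] is the callee and argtypes[2:end] are its arguments.
function tfunc(::typeof(demo_fill), argtypes::Vector{Any})
    length(argtypes) == 3 || return nothing
    n = argtypes[3]
    n isa Core.Const || return nothing          # length unknown at inference time
    (n.val isa Int && n.val >= 0) || return nothing
    return NTuple{n.val, Float32}
end
```

Returning `nothing` when the length is not a `Core.Const` lets the caller fall back to whatever inference derived on its own.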

In fact, I tried to take it one step further and introduce efuncs to model the effects of our intrinsics, but Julia's compiler doesn't have a great way to spoof effects (see the note in intrinsics.jl). Here's the WIP:

diff -ruN src/compiler/interface.jl src/compiler/interface.jl
--- src/compiler/interface.jl	2026-02-07 20:19:24.423839322 +0100
+++ src/compiler/interface.jl	2026-02-07 20:20:13.241318532 +0100
@@ -66,7 +66,31 @@
 CC.unlock_mi_inference(::cuTileInterpreter, ::MethodInstance) = nothing

 # Setup caching - generates cache_owner and ipo_dataflow_analysis! methods
-@setup_caching cuTileInterpreter.cache
+# Caching setup — replaces @setup_caching to add per-intrinsic effect overrides.
+# @setup_caching generates cache_owner + finish!; we define both manually so we
+# can modify ipo_effects in finish! before the base method encodes them into
+# the CodeInstance's ipo_purity_bits.
+CC.cache_owner(interp::cuTileInterpreter) =
+    CompilerCaching.cache_owner(interp.cache)
+
+function CC.finish!(interp::cuTileInterpreter, caller::CC.InferenceState,
+                    validation_world::UInt, time_before::UInt64)
+    CC.stack_analysis_result!(caller.result, CuTileResults())
+    # Apply per-intrinsic effect overrides to ipo_effects before the base
+    # finish! encodes them into ipo_purity_bits on the CodeInstance.
+    specTypes = caller.linfo.specTypes
+    if specTypes isa DataType
+        ftype = specTypes.parameters[1]
+        if isdefined(ftype, :instance)
+            override = _efunc(ftype.instance, caller.result.ipo_effects)
+            if override !== nothing
+                caller.result.ipo_effects = override
+            end
+        end
+    end
+    @invoke CC.finish!(interp::CC.AbstractInterpreter, caller::CC.InferenceState,
+                        validation_world::UInt, time_before::UInt64)
+end

 # Optimization flags
 CC.may_optimize(::cuTileInterpreter) = true
@@ -83,6 +107,15 @@
 # Intrinsics module exists).
 tfunc(@nospecialize(f), argtypes::Vector{Any}) = nothing

+# Per-intrinsic effect overrides using multiple dispatch.
+# Returns nothing when no override applies (fallback).
+# Concrete per-intrinsic methods are defined in intrinsics/ for
+# side-effectful operations (stores, atomics).
+_efunc(@nospecialize(f), effects::CC.Effects) = nothing
+
+# Check if a function is defined in the Intrinsics module.
+_is_intrinsic(@nospecialize(f)) = isa(f, Function) && parentmodule(f) === Intrinsics
+
 #=============================================================================
  Subprogram inference for reduce/scan
 =============================================================================#
@@ -174,7 +207,7 @@
             sv::CC.InferenceState, max_methods::Int)
         rt_override = tfunc(f, arginfo.argtypes)
         subprog = _infer_subprogram(interp, f, arginfo, si, vtypes, sv)
-        rt_override === nothing && subprog === nothing && return result
+        rt_override === nothing && subprog === nothing && !_is_intrinsic(f) && return result
         wrapped = CC.Future{CC.CallMeta}()
         push!(sv.tasks, function (interp′, sv′)
             isready(result) || return false
@@ -182,8 +215,10 @@
             cm = result[]
             sp = subprog !== nothing ? subprog[] : nothing
             rt = rt_override !== nothing ? rt_override : cm.rt
+            effects_override = _efunc(f, cm.effects)
+            effects = effects_override !== nothing ? effects_override : cm.effects
             info = sp !== nothing ? SubprogramCallInfo(cm.info, sp.info) : cm.info
-            wrapped[] = CC.CallMeta(rt, cm.exct, cm.effects, info, cm.refinements)
+            wrapped[] = CC.CallMeta(rt, cm.exct, effects, info, cm.refinements)
             return true
         end)
         return wrapped
@@ -197,7 +232,7 @@
             sv::CC.InferenceState, max_methods::Int)
         rt_override = tfunc(f, arginfo.argtypes)
         subprog = _infer_subprogram(interp, f, arginfo, si, nothing, sv)
-        rt_override === nothing && subprog === nothing && return result
+        rt_override === nothing && subprog === nothing && !_is_intrinsic(f) && return result
         wrapped = CC.Future{CC.CallMeta}()
         push!(sv.tasks, function (interp′, sv′)
             isready(result) || return false
@@ -205,8 +240,10 @@
             cm = result[]
             sp = subprog !== nothing ? subprog[] : nothing
             rt = rt_override !== nothing ? rt_override : cm.rt
+            effects_override = _efunc(f, cm.effects)
+            effects = effects_override !== nothing ? effects_override : cm.effects
             info = sp !== nothing ? SubprogramCallInfo(cm.info, sp.info) : cm.info
-            wrapped[] = CC.CallMeta(rt, cm.exct, cm.effects, info, cm.refinements)
+            wrapped[] = CC.CallMeta(rt, cm.exct, effects, info, cm.refinements)
             return true
         end)
         return wrapped
@@ -220,9 +257,13 @@
             sv::CC.AbsIntState, max_methods::Int)
         _infer_subprogram(interp, f, arginfo, si, nothing, sv)  # side-effect only
         rt_override = tfunc(f, arginfo.argtypes)
-        if rt_override !== nothing
-            return CC.CallMeta(rt_override, result.exct, result.effects,
-                               result.info, result.refinements)
+        effects_override = _efunc(f, result.effects)
+        if rt_override !== nothing || effects_override !== nothing
+            return CC.CallMeta(
+                rt_override !== nothing ? rt_override : result.rt,
+                result.exct,
+                effects_override !== nothing ? effects_override : result.effects,
+                result.info, result.refinements)
         end
         return result
     end
diff -ruN src/compiler/intrinsics/atomics.jl src/compiler/intrinsics/atomics.jl
--- src/compiler/intrinsics/atomics.jl	2026-02-07 20:19:24.428020571 +0100
+++ src/compiler/intrinsics/atomics.jl	2026-02-07 20:21:03.798317824 +0100
@@ -41,10 +41,11 @@
     """
     @noinline function atomic_cas(array::TileArray{T, N}, index, expected, desired,
                                    memory_order::Int, memory_scope::Int) where {T, N}
-        donotdelete()
         compilerbarrier(:const, zero(T))::T
     end
 end
+_efunc(::typeof(Intrinsics.atomic_cas), effects::CC.Effects) =
+    CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
 function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_cas), args)
     cb = ctx.cb
     tt = ctx.tt
@@ -179,10 +180,11 @@
     """
     @noinline function atomic_xchg(array::TileArray{T, N}, index, val,
                                     memory_order::Int, memory_scope::Int) where {T, N}
-        donotdelete()
         compilerbarrier(:const, zero(T))
     end
 end
+_efunc(::typeof(Intrinsics.atomic_xchg), effects::CC.Effects) =
+    CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
 function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_xchg), args)
     emit_atomic_rmw!(ctx, args, AtomicXCHG)
 end
@@ -198,10 +200,11 @@
     """
     @noinline function atomic_add(array::TileArray{T, N}, index, val,
                                    memory_order::Int, memory_scope::Int) where {T, N}
-        donotdelete()
         compilerbarrier(:const, zero(T))
     end
 end
+_efunc(::typeof(Intrinsics.atomic_add), effects::CC.Effects) =
+    CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
 function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.atomic_add), args)
     emit_atomic_rmw!(ctx, args, AtomicADD)
 end
diff -ruN src/compiler/intrinsics/memory.jl src/compiler/intrinsics/memory.jl
--- src/compiler/intrinsics/memory.jl	2026-02-07 20:19:24.429705165 +0100
+++ src/compiler/intrinsics/memory.jl	2026-02-07 20:21:09.960317738 +0100
@@ -95,10 +95,11 @@
     @noinline function store_ptr_tko(ptrs::Tile{Ptr{T}, S}, values::Tile{T, S},
                                       latency::Union{Int, Nothing},
                                       mask::Union{Tile{Bool, S}, Nothing}=nothing) where {T, S}
-        donotdelete()
-        nothing
+        compilerbarrier(:const, nothing)
     end
 end
+_efunc(::typeof(Intrinsics.store_ptr_tko), effects::CC.Effects) =
+    CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
 function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.store_ptr_tko), args)
     cb = ctx.cb
     tt = ctx.tt
diff -ruN src/compiler/intrinsics/views.jl src/compiler/intrinsics/views.jl
--- src/compiler/intrinsics/views.jl	2026-02-07 20:19:24.431298839 +0100
+++ src/compiler/intrinsics/views.jl	2026-02-07 20:21:15.551317660 +0100
@@ -378,10 +378,11 @@
                                              latency::Union{Int, Nothing},
                                              allow_tma::Bool,
                                              indices::NTuple{M, <:Integer}) where {T, N, Shape, M}
-        donotdelete()
-        nothing
+        compilerbarrier(:const, nothing)
     end
 end
+_efunc(::typeof(Intrinsics.store_partition_view), effects::CC.Effects) =
+    CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)
 function emit_intrinsic!(ctx::CGCtx, ::typeof(Intrinsics.store_partition_view), args)
     cb = ctx.cb
     tt = ctx.tt
diff -ruN src/compiler/intrinsics.jl src/compiler/intrinsics.jl
--- src/compiler/intrinsics.jl	2026-02-07 20:19:24.426082161 +0100
+++ src/compiler/intrinsics.jl	2026-02-07 20:20:45.284318084 +0100
@@ -4,7 +4,7 @@

 module Intrinsics

-using Base: compilerbarrier, donotdelete
+using Base: compilerbarrier
 using ..cuTile: Tile, TileArray, Constant, TensorView, PartitionView
 using ..cuTile: Signedness, SignednessSigned, SignednessUnsigned
 using ..cuTile: ComparisonPredicate, CmpLessThan, CmpLessThanOrEqual, CmpGreaterThan, CmpGreaterThanOrEqual, CmpEqual, CmpNotEqual
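For readers unfamiliar with the `CC.Effects(effects; effect_free=CC.ALWAYS_FALSE)` pattern used by the `_efunc` methods in the diff, the following sketch shows it in isolation. It relies on Julia compiler internals (`Core.Compiler.Effects`, `EFFECTS_TOTAL`, `ALWAYS_FALSE`, `is_effect_free`, `is_nothrow`), which are not a stable API and may differ between Julia versions.

```julia
const CC = Core.Compiler

# Start from the most optimistic effects and clear only the effect_free bit,
# leaving every other effect as it was.
total = CC.EFFECTS_TOTAL
overridden = CC.Effects(total; effect_free=CC.ALWAYS_FALSE)

# A call that is not effect-free should not be removed as dead code even
# when its result is unused.
still_effect_free = CC.is_effect_free(overridden)
still_nothrow = CC.is_nothrow(overridden)
```

The keyword constructor copies all fields from its first argument, so the override touches nothing except `effect_free`.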

@maleadt maleadt force-pushed the tb/global_constants branch from 8cef2fb to 0cdb2aa February 7, 2026 19:34
@maleadt maleadt marked this pull request as ready for review February 7, 2026 20:57
@maleadt maleadt changed the title from "Fix constants and switch intrinsics to constant value inputs" to "Fix constants and switch intrinsics to constant value inputs + tfuncs" Feb 8, 2026
@maleadt maleadt merged commit bc58f9d into main Feb 8, 2026
8 checks passed
@maleadt maleadt deleted the tb/global_constants branch February 8, 2026 06:31
@vchuravy
Member

vchuravy commented Feb 9, 2026

Idea: instead of turning the Expr(:call, ...) into an Expr(:invoke, ...) for our custom builtin, use a NoCallInfo in the abstract_call_known return value: Future(CallMeta(rt, exct, effects, NoCallInfo(), refinements)).

@maleadt maleadt mentioned this pull request Feb 9, 2026
Development

Successfully merging this pull request may close these issues.

Using constants generates compilation error

2 participants